CCIL: Continuity-based Data Augmentation for Corrective Imitation Learning
We present a new technique to enhance the robustness of imitation learning
methods by generating corrective data to account for compounding errors and
disturbances. While existing methods rely on interactive expert labeling,
additional offline datasets, or domain-specific invariances, our approach
requires minimal additional assumptions beyond access to expert data. The key
insight is to leverage local continuity in the environment dynamics to generate
corrective labels. Our method first constructs a dynamics model from the expert
demonstrations, encouraging local Lipschitz continuity in the learned model. In
locally continuous regions, this model allows us to generate corrective labels
within the neighborhood of the demonstrations but beyond the actual set of
states and actions in the dataset. Training on this augmented data enhances the
agent's ability to recover from perturbations and deal with compounding errors.
We demonstrate the effectiveness of our generated labels through experiments in
a variety of simulated robotics domains with distinct forms of continuity and
discontinuity, including classic control problems, drone flying, navigation
with high-dimensional sensor observations, legged locomotion, and tabletop
manipulation.
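As a rough illustration of the label-generation idea, the sketch below uses a hypothetical locally linear dynamics model (the paper learns a Lipschitz-regularized model from data instead, and the matrices and magnitudes here are made up): given a state perturbed away from a demonstration, it solves for the action the model predicts will bring the agent back to the demonstrated next state, yielding an augmented corrective training pair.

```python
# Minimal sketch (not the authors' implementation) of corrective-label
# generation, assuming locally linear dynamics f(s, a) = A s + B a.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear dynamics standing in for the learned model.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = 0.1 * np.eye(2)

def dynamics(s, a):
    return A @ s + B @ a

# One expert transition (s_t, a_t, s_{t+1}) from the demonstration.
s_t = np.array([0.0, 1.0])
a_t = np.array([0.5, -0.2])
s_next = dynamics(s_t, a_t)

# Perturb the state within a small neighborhood of the demonstration,
# where local continuity makes the model's predictions trustworthy.
s_perturbed = s_t + 0.05 * rng.standard_normal(2)

# Corrective label: the action that steers the perturbed state back to
# the demonstrated next state under the model (a least-squares solve
# for the linear case).
a_corrective, *_ = np.linalg.lstsq(B, s_next - A @ s_perturbed, rcond=None)

# (s_perturbed, a_corrective) is the augmented pair added to training data.
residual = np.linalg.norm(dynamics(s_perturbed, a_corrective) - s_next)
print(residual < 1e-8)  # True: the corrective action returns to s_next
```

Training a policy on such pairs teaches it to funnel nearby off-demonstration states back onto the expert trajectory, which is the recovery behavior the abstract describes.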
Cherry-Picking with Reinforcement Learning: Robust Dynamic Grasping in Unstable Conditions
Grasping small objects surrounded by unstable or non-rigid material plays a
crucial role in applications such as surgery, harvesting, construction,
disaster recovery, and assisted feeding. This task is especially difficult when
fine manipulation is required in the presence of sensor noise and perception
errors; such errors inevitably trigger dynamic motion, which is challenging to
model precisely. Rather than building accurate models of contacts and dynamics,
data-driven methods like reinforcement learning (RL) can optimize task
performance via trial and error. Applying RL methods to real robots, however, has been
hindered by factors such as prohibitively high sample complexity or the high
training infrastructure cost for providing resets on hardware. This work
presents CherryBot, an RL system that uses chopsticks for fine manipulation
and surpasses human reactiveness on some dynamic grasping tasks. By
integrating imprecise simulators, suboptimal demonstrations, and external state
estimation, we study how to make a real-world robot learning system sample
efficient and general while reducing the human effort required for supervision.
Our system shows continual improvement through 30 minutes of real-world
interaction: through reactive retry, it achieves an almost 100% success rate on
the demanding task of using chopsticks to grasp small objects swinging in the
air. We demonstrate the reactiveness, robustness and generalizability of
CherryBot to varying object shapes and dynamics (e.g., external disturbances
like wind and human perturbations). Videos are available at
https://goodcherrybot.github.io/
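The "reactive retry" behavior mentioned above can be sketched abstractly (all names, probabilities, and attempt counts below are illustrative assumptions, not the paper's numbers; the real system closes this loop from external state estimates on hardware): instead of declaring failure after one grasp attempt, the controller detects a miss and immediately retries.

```python
# Hypothetical sketch of a reactive-retry loop. attempt_grasp stands in
# for one closed-loop grasp attempt whose outcome is observed online.
import random

def attempt_grasp(rng, success_prob=0.6):
    """Stand-in for a single grasp attempt with an observable outcome."""
    return rng.random() < success_prob

def grasp_with_retry(rng, max_attempts=5):
    """On a detected failure, retry immediately instead of resetting."""
    for attempt in range(1, max_attempts + 1):
        if attempt_grasp(rng):
            return attempt  # number of attempts used
    return None  # gave up

rng = random.Random(1)
trials = [grasp_with_retry(rng) for _ in range(1000)]
success_rate = sum(t is not None for t in trials) / len(trials)
print(success_rate > 0.95)  # True: retries lift a 60% attempt into ~99%
```

The point of the toy model is only that a modest per-attempt success rate compounds toward near-certain success when failures are detected and retried quickly, which is why reactiveness matters for the near-100% figure the abstract reports.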
Behavioral Experiments in Email Filter Evasion
Despite decades of effort to combat spam, unwanted and even malicious emails, such as phish that aim to deceive recipients into disclosing sensitive information, still routinely find their way into one's mailbox. To be sure, email filters manage to stop a large fraction of spam emails from ever reaching users, but spammers and phishers have mastered the art of filter evasion: manipulating the content of email messages to avoid being filtered. We present a unique behavioral experiment designed to study email filter evasion. Our experiment is framed in somewhat broader terms: given the widespread use of machine learning methods for distinguishing spam and non-spam, we investigate how human subjects manipulate a spam template to evade a classification-based filter. We find that adding a small amount of noise to a filter significantly reduces the ability of subjects to evade it, observing that noise does not merely have a short-term impact, but also degrades evasion performance in the longer term. Moreover, we find that greater coverage of an email template by the classifier (filter) features significantly increases the difficulty of evading it. This observation suggests that aggressive feature reduction, a common practice in applied machine learning, can actually facilitate evasion. In addition to the descriptive analysis of behavior, we develop a synthetic model of human evasion behavior which closely matches observed behavior and effectively replicates experimental findings in simulation.
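The noise effect the experiment studies can be illustrated with a toy score-based filter (the weights, words, and noise level below are invented for illustration and are not the paper's setup): against a deterministic filter, one well-chosen edit evades reliably; once Gaussian noise is added to the score, the same edit no longer gives the evader consistent feedback.

```python
# Toy sketch of why a noisy filter is harder to evade: a linear spam
# score over word features, optionally perturbed by Gaussian noise.
import random

SPAM_WEIGHTS = {"free": 2.0, "winner": 1.5, "prize": 1.5, "meeting": -1.0}
THRESHOLD = 1.0

def spam_score(words, noise=0.0, rng=None):
    score = sum(SPAM_WEIGHTS.get(w, 0.0) for w in words)
    if noise and rng is not None:
        score += rng.gauss(0.0, noise)  # randomize the decision boundary
    return score

def is_filtered(words, noise=0.0, rng=None):
    return spam_score(words, noise, rng) >= THRESHOLD

rng = random.Random(0)
template = ["free", "prize", "meeting"]

# Deterministic filter: dropping one high-weight word evades it reliably.
evaded = [w for w in template if w != "free"]
print(is_filtered(template), is_filtered(evaded))  # True False

# Noisy filter: the identical edit is sometimes still caught, so probing
# the filter yields inconsistent feedback and evasion is harder to learn.
outcomes = [is_filtered(evaded, noise=1.0, rng=rng) for _ in range(1000)]
print(0 < sum(outcomes) < 1000)  # True: outcomes are mixed
```

This mirrors the abstract's finding mechanically: noise decouples an edit from its observed outcome, degrading the evader's trial-and-error signal both immediately and over repeated attempts.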